AML Project: American Sign Language Recognition using Deep Learning

Implemented By:

  1. Sushant Kotwal (skotwal)
  2. Tejas Ram Ramesh (terame)
  3. Atharva Pandit (atpand)

For implementing this project, we used the following references:

  1. https://www.youtube.com/watch?v=6Bn0PY_ouBY
  2. https://www.youtube.com/watch?v=YjnGou4skGU
  3. https://www.youtube.com/watch?v=3hjsdfTVWRQ&t=699s
  4. Hands-On Machine Learning (HOML) by Aurélien Géron
  5. Deep Learning with Python by François Chollet
  6. https://towardsdatascience.com/american-sign-language-recognition-using-cnn-36910b86d651
  7. https://data-flair.training/blogs/sign-language-recognition-python-ml-opencv/

Note: Run this notebook in Visual Studio or any other local IDE. Google Colab does not support some functions of the cv2 library (such as webcam access) needed to run this file.

The dataset can be considered balanced: the number of images per label ranges from about 900 to slightly above 1200.
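A balance check like the one above can be sketched as follows. The on-disk layout (one subfolder per label under a data directory) and the range bounds are assumptions for illustration, not the notebook's actual paths.

```python
# Sketch: counting images per gesture label to check class balance.
# Assumes the dataset is laid out as data_dir/<label>/<image files>.
import os
from collections import Counter

def count_images_per_label(data_dir):
    """Return a Counter mapping each label folder to its image count."""
    counts = Counter()
    for label in sorted(os.listdir(data_dir)):
        label_path = os.path.join(data_dir, label)
        if os.path.isdir(label_path):
            counts[label] = len(os.listdir(label_path))
    return counts

def is_roughly_balanced(counts, low=900, high=1300):
    """True if every per-class count falls inside the stated range."""
    return all(low <= n <= high for n in counts.values())
```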

Plotting the learning curves for our model
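The curves can be drawn from a Keras History object's dictionary. The key names (`accuracy`, `val_accuracy`, `loss`, `val_loss`) assume the model was compiled with `metrics=['accuracy']`; a minimal sketch:

```python
# Sketch: plotting training vs. validation accuracy and loss from a
# Keras History dict. Key names assume metrics=['accuracy'] at compile time.
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; safe outside notebooks
import matplotlib.pyplot as plt

def plot_learning_curves(history_dict):
    """Return a figure with accuracy and loss curves side by side."""
    fig, (ax_acc, ax_loss) = plt.subplots(1, 2, figsize=(10, 4))
    ax_acc.plot(history_dict["accuracy"], label="train")
    ax_acc.plot(history_dict["val_accuracy"], label="validation")
    ax_acc.set_title("Accuracy"); ax_acc.set_xlabel("epoch"); ax_acc.legend()
    ax_loss.plot(history_dict["loss"], label="train")
    ax_loss.plot(history_dict["val_loss"], label="validation")
    ax_loss.set_title("Loss"); ax_loss.set_xlabel("epoch"); ax_loss.legend()
    return fig
```

In the notebook, `history.history` from `model.fit(...)` would be passed in place of a hand-built dict.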

Here we saw that this model overfits the data, mainly because of the high number of neurons in the CNN. So we now add a dropout layer after each pooling layer, with a dropout rate of 30%.
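A minimal sketch of such an architecture in Keras, with `Dropout(0.3)` after each pooling layer as described. The filter counts, the 64x64 grayscale input size, and the 26-class output are illustrative assumptions, not the notebook's actual values.

```python
# Sketch: a small CNN with a 30% Dropout layer after each pooling layer.
# Input size, filter counts, and class count are assumptions.
from tensorflow.keras import layers, models

def build_model(input_shape=(64, 64, 1), num_classes=26):
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.3),  # regularize right after pooling
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.3),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```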

Learning Curves for the 2nd Model

From the learning curves, accuracy scores, and classification report, we see that this model fits the data well and achieves good accuracy. So we now test the model on live input from the webcam.

Testing on live input

Here we saw that our model performed very poorly on the live data, mainly because live data is very different from the static images we trained on: the training set had little variation in image depth, lighting, hand orientation, and size. To solve this problem, we generate our own data for training.

Here we create our own dataset for every hand gesture, producing both the training and testing data in the same step. Since we are generating the images ourselves, we can capture any number of images per gesture, so there is no class-imbalance problem.
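Before capturing frames, the train/test folder layout has to exist. A sketch, where the gesture label set and the split names are placeholder assumptions:

```python
# Sketch: creating train/test folders, one per gesture, before the
# capture loop writes a fixed number of frames into each.
import os

GESTURES = ["A", "B", "C"]  # placeholder label set

def create_dataset_dirs(root, gestures, splits=("train", "test")):
    """Create root/<split>/<gesture> folders and return their paths."""
    paths = []
    for split in splits:
        for gesture in gestures:
            path = os.path.join(root, split, gesture)
            os.makedirs(path, exist_ok=True)
            paths.append(path)
    return paths
```

The capture loop then saves each webcam frame as `root/train/<gesture>/<i>.jpg` until the requested count is reached.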

Now that the images have been created, we further enhance the data by applying additional transformations: shifting, rotating, and flipping the images. We also convert the generated images to grayscale, since they were captured in RGB format and the CNN model expects grayscale input.
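The shift/rotate/flip augmentation can be sketched with Keras' `ImageDataGenerator`. The parameter values here are assumptions, and the images are assumed to already be single-channel grayscale arrays:

```python
# Sketch: shift/rotate/flip augmentation via ImageDataGenerator.
# Parameter values are illustrative assumptions.
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=15,       # small random rotations
    width_shift_range=0.1,   # horizontal shifts
    height_shift_range=0.1,  # vertical shifts
    horizontal_flip=True,    # mirror the hand
    rescale=1.0 / 255.0,
)

# Example: draw one augmented batch from random grayscale images.
images = np.random.randint(0, 256, size=(8, 64, 64, 1)).astype("float32")
batch = next(datagen.flow(images, batch_size=8, shuffle=False))
```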

After augmentation, a total of 50,760 images with the specified transformations were created for the training set, and 22,744 images for the validation set.

Model Training on the Augmented Dataset

Now that the model has been trained on the self-generated data, we test it again on live webcam input.
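The evaluation step above can be sketched as follows; `model`, `x_val`, and `y_val` (one-hot labels) are assumed to come from the earlier training cells, and the per-class report uses scikit-learn's `classification_report`:

```python
# Sketch: evaluating the trained model and printing per-class results.
# `model`, `x_val`, `y_val` are stand-ins for the notebook's variables.
import numpy as np
from sklearn.metrics import classification_report

def evaluate(model, x_val, y_val):
    """Print a per-class classification report and return overall accuracy."""
    probs = model.predict(x_val)
    preds = np.argmax(probs, axis=1)
    truth = np.argmax(y_val, axis=1)  # y_val assumed one-hot
    print(classification_report(truth, preds))
    return float((preds == truth).mean())
```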